Minimum Splits Based Discretization for Continuous Features
Authors
Abstract
Discretization refers to splitting the range of continuous values into intervals so as to provide useful information about classes. This is usually done by minimizing a goodness measure subject to constraints such as the maximal number of intervals, the minimal number of examples per interval, or some stopping criterion for splitting. We take a different approach by searching for minimum splits that minimize the number of intervals with respect to a threshold of impurity (i.e., badness). We propose a total-entropy-motivated selection of the best split from minimum splits, without requiring additional constraints. Experiments show that the proposed method produces better decision trees.
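The abstract describes the minimum-split search only in words. The following Python sketch shows one way such a search could be organized, under assumptions the abstract does not fix: impurity is taken to be the Shannon entropy of the class labels in an interval, the "total entropy" used to choose among minimum splits is assumed to be the size-weighted sum n_i·H_i over intervals, and cut points are placed at midpoints between adjacent distinct values. The names `entropy`, `min_split_discretize`, and `threshold` are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter
from itertools import groupby


def entropy(counts):
    """Shannon entropy (bits) of a Counter of class labels."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c > 0)


def min_split_discretize(xs, ys, threshold):
    """Hypothetical sketch: find the fewest intervals whose class entropy is
    <= threshold; break ties by the smallest size-weighted total entropy
    (an assumed reading of "total entropy", not necessarily the paper's).
    Returns cut points, assumed to sit at midpoints between distinct values.
    """
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    # Examples sharing a value form one block; cuts only fall between blocks.
    blocks = [(v, Counter(label for _, label in grp))
              for v, grp in groupby(pairs, key=lambda p: p[0])]
    m = len(blocks)
    inf = (math.inf, math.inf)
    # best[j] = (number of intervals, total entropy) for blocks[:j];
    # back[j] = index of the first block of the last interval.
    best = [inf] * (m + 1)
    back = [0] * (m + 1)
    best[0] = (0, 0.0)
    for j in range(1, m + 1):
        counts = Counter()
        for i in range(j - 1, -1, -1):
            counts += blocks[i][1]  # candidate last interval = blocks[i:j]
            h = entropy(counts)
            # Entropy is not monotone as an interval grows, so keep scanning.
            if h > threshold or best[i] == inf:
                continue
            cand = (best[i][0] + 1, best[i][1] + sum(counts.values()) * h)
            if cand < best[j]:  # lexicographic: fewer intervals first
                best[j], back[j] = cand, i
    if best[m] == inf:
        raise ValueError("no partition meets the impurity threshold")
    # Walk the back-pointers to recover the cut points.
    cuts, j = [], m
    while j > 0:
        i = back[j]
        if i > 0:
            cuts.append((blocks[i - 1][0] + blocks[i][0]) / 2)
        j = i
    return sorted(cuts)
```

For example, `min_split_discretize([1, 2, 3, 4, 5, 6], list("aaabbb"), 0.0)` returns `[3.5]`. With a threshold of 0, every interval must be pure, so two intervals suffice. A dynamic program is used because a sub-interval of an acceptable interval can have higher entropy than the interval itself, which makes a simple greedy left-to-right scan unreliable. The cost is quadratic in the number of distinct values.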
Similar Resources
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High dimensional microarray datasets are difficult to classify since they have many features with small number of instances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...
Error-Based and Entropy-Based Discretization of Continuous Features
We present a comparison of error-based and entropy-based methods for discretization of continuous features. Our study includes both an extensive empirical comparison as well as an analysis of scenarios where error minimization may be an inappropriate discretization criterion. We present a discretization method based on the C4.5 decision tree algorithm and compare it to an existing entropy-based ...
Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning
Since most real-world applications of classification learning involve continuous-valued attributes, properly addressing the discretization process is an important problem. This paper addresses the use of the entropy minimization heuristic for discretizing the range of a continuous-valued attribute into multiple intervals. We briefly present theoretical evidence for the appropriateness of this h...
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
Compression-Based Discretization of Continuous Attributes
Discretization of continuous attributes into ordered discrete attributes can be beneficial even for propositional induction algorithms that are capable of handling continuous attributes directly. Benefits include possibly large improvements in induction time, smaller sizes of induced trees or rule sets, and even improved predictive accuracy. We define a global evaluation measure for discretization...